The Ball-Proximal (="Broximal") Point Method: a New Algorithm, Convergence Theory, and Applications
Gruntkowska, Kaja, Li, Hanmin, Rane, Aadi, Richtárik, Peter
Non-smooth and non-convex global optimization poses significant challenges across various applications, where standard gradient-based methods often struggle. We propose the Ball-Proximal Point Method, Broximal Point Method, or Ball Point Method (BPM) for short - a novel algorithmic framework inspired by the classical Proximal Point Method (PPM) (Rockafellar, 1976), which, as we show, sheds new light on several foundational optimization paradigms and phenomena, including non-convex and non-smooth optimization, acceleration, smoothing, adaptive stepsize selection, and trust-region methods. At the core of BPM lies the ball-proximal ("broximal") operator, which arises from the classical proximal operator by replacing the quadratic distance penalty with a ball constraint. Surprisingly, and in sharp contrast with the sublinear rate of PPM in the non-smooth convex regime, we prove that BPM converges linearly and in a finite number of steps in the same regime. Furthermore, by introducing the concept of ball-convexity, we prove that BPM retains the same global convergence guarantees under weaker assumptions, making it a powerful tool for a broader class of potentially non-convex optimization problems. Just as PPM plays the role of a conceptual method inspiring the development of practically efficient algorithms and algorithmic elements, e.g., gradient descent, adaptive step sizes, acceleration (Ahn & Sra, 2020), and the "W" in AdamW (Zhuang et al., 2022), we believe that BPM should be understood in the same manner: as a blueprint and inspiration for further development.
- North America > United States > New York (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Russia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
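To make the ball-proximal step above concrete, here is a minimal sketch of one BPM iteration, using scipy to solve the ball-constrained subproblem numerically. The ℓ1 objective, radius, and starting point are illustrative assumptions, not taken from the paper, and a general-purpose solver stands in for whatever subproblem oracle a real implementation would use.

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

def broximal_step(f, x, r):
    """One ball-proximal ("broximal") step: minimize f over the ball B(x, r).

    The quadratic penalty of the classical proximal operator is replaced
    by a hard ball constraint, as the abstract describes.
    """
    ball = NonlinearConstraint(lambda y: np.sum((y - x) ** 2), 0.0, r ** 2)
    return minimize(f, x, constraints=[ball]).x

# Illustrative non-smooth convex objective: f(y) = ||y||_1.
f = lambda y: np.abs(y).sum()
x = np.array([3.0, -2.0])
for _ in range(6):            # BPM iterations: x_{k+1} = broximal step at x_k
    x = broximal_step(f, x, r=1.0)
print(x)                      # approaches the minimizer 0 after a few steps
```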
An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport
Chen, Xiang, Wang, Faqiang, Liu, Jun, Cui, Li
The Unbalanced Optimal Transport (UOT) problem plays an increasingly important role in computational biology, computational imaging and deep learning. The Scaling algorithm is widely used to solve UOT due to its convenience and good convergence properties. However, this algorithm loses accuracy for large regularization parameters, while small regularization parameters can easily lead to numerical overflow due to stability issues. We address this challenge by developing an inexact Bregman proximal point method for solving UOT, which approximates the proximal operator using the Scaling algorithm at each iteration. The algorithm (1) converges to the true solution of UOT, (2) has theoretical guarantees and robust regularization parameter selection, (3) mitigates numerical stability issues, and (4) can achieve comparable computational complexity to the Scaling algorithm in specific practice. Building upon this, we develop an accelerated version of the inexact Bregman proximal point method for solving UOT by using acceleration techniques for Bregman proximal point methods, and we provide theoretical guarantees and experimental validation of convergence and acceleration.
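A minimal sketch of the inexact Bregman proximal point idea under stated assumptions: the outer loop evaluates a KL-divergence prox only approximately, with a few plain mirror-descent steps standing in for the Scaling algorithm, and a simple quadratic on the positive orthant standing in for the UOT objective.

```python
import numpy as np

def kl(p, q):
    """Generalized KL divergence, the Bregman divergence of negative entropy."""
    return np.sum(p * np.log(p / q) - p + q)

def bregman_prox_step(grad_f, x, lam, inner_iters=10):
    """Inexactly evaluate argmin_y f(y) + (1/lam) * KL(y, x).

    The inner loop is plain mirror descent on the prox subproblem,
    standing in for the Scaling algorithm used in the paper.
    """
    y = x.copy()
    for _ in range(inner_iters):
        g = grad_f(y) + np.log(y / x) / lam  # gradient of the subproblem
        y = y * np.exp(-0.1 * g)             # multiplicative (entropy-geometry) step
    return y

# Illustrative smooth objective on the positive orthant (an assumption,
# not the UOT objective): f(y) = 0.5 * ||y - t||^2 with target t > 0.
t = np.array([1.0, 2.0, 0.5])
grad_f = lambda y: y - t

x = np.ones(3)
for _ in range(50):                # outer Bregman proximal point iterations
    x = bregman_prox_step(grad_f, x, lam=1.0)
print(x, kl(x, t))                 # x approaches t despite inexact inner solves
```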
Stochastic Two Points Method for Deep Model Zeroth-order Optimization
Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. Zeroth-order methods offer a promising direction for tackling this challenge, since only forward passes are needed to update the model. This paper introduces an efficient Stochastic Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under both general and relaxed smoothness assumptions. These properties also shed light on a faster and more stable S2P variant, Accelerated S2P (AS2P), which exploits our new convergence results to better represent the dynamics of deep models in training. Our comprehensive empirical results show that AS2P is highly effective in optimizing objectives for large deep models, including language models, and outperforms standard methods across various model types and scales, with a 2× speed-up in training on most tasks.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Michigan (0.04)
- Asia > Middle East > Jordan (0.04)
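For readers unfamiliar with this estimator family, a generic two-point zeroth-order sketch follows. It captures the "two forward passes per update" idea; the specific S2P/AS2P update rules and step-size schedules from the paper are not reproduced here, and the quadratic objective is an illustrative assumption.

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-3, rng=None):
    """Classical two-point zeroth-order gradient estimator.

    A generic stochastic two-point scheme in the spirit of S2P: the gradient
    is estimated from two function evaluations along a random direction.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)             # random direction
    u /= np.linalg.norm(u)
    # Directional finite difference built from two forward passes only.
    d = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)
    return d * u

# Illustrative use on a quadratic (an assumption, not a deep model).
f = lambda x: 0.5 * np.sum(x ** 2)
x = np.full(10, 5.0)
for _ in range(2000):
    x -= 0.5 * two_point_grad_estimate(f, x)
print(np.linalg.norm(x))                         # should be near 0
```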
Robotic Dough Shaping
Ondras, Jan, Ni, Di, Deng, Xi, Gu, Zeqi, Zheng, Henry, Bhattacharjee, Tapomayukh
The ability to deform soft objects remains a great challenge for robots due to difficulties in defining the problem mathematically. In this paper, we address the problem of shaping a piece of dough-like deformable material into a 2D target shape presented upfront. We use a 6 degree-of-freedom WidowX-250 Robot Arm equipped with a rolling pin, together with information collected from an RGB-D camera and a tactile sensor. We present and compare several control policies, including a dough shrinking action, in extensive experiments across three kinds of deformable materials and three target dough shape sizes, achieving an intersection over union (IoU) of 0.90. Our results show that: i) rolling dough from the highest dough point is more efficient than from the 2D/3D dough centroid; ii) it might be better to stop the roll movement at the current dough boundary as opposed to the target shape outline; iii) the shrink action might be beneficial only if properly tuned with respect to the expand action; and iv) the Play-Doh material is easier to shape to a target shape than Plasticine or Kinetic sand. Video demonstrations of our work are available at https://youtu.be/ZzLMxuITdt4. Keywords: Robotics and Mechatronics, Machine Vision and Perception, Sensors and Actuators
- North America > United States (0.04)
- Asia (0.04)
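Since the reported 0.90 score is an intersection over union between the achieved dough shape and the 2D target, here is the standard IoU computation on binary masks; the masks below are hypothetical stand-ins for the segmented dough and the target outline.

```python
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over union of two boolean masks, the score reported above."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

# Hypothetical masks standing in for the segmented dough and target shape.
dough = np.zeros((100, 100), dtype=bool); dough[20:80, 20:80] = True
target = np.zeros((100, 100), dtype=bool); target[25:85, 25:85] = True
print(f"IoU = {iou(dough, target):.2f}")
```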
Quantum algorithms for Second-Order Cone Programming and Support Vector Machines
Kerenidis, Iordanis, Prakash, Anupam, Szilágyi, Dániel
Convex optimization is one of the central areas of study in computer science and mathematical optimization. The reason for the great importance of convex optimization is twofold. Firstly, starting with the seminal works of Khachiyan [25] and Karmarkar [18], efficient algorithms have been developed for a large family of convex optimization problems over the last few decades. Secondly, convex optimization has many real world applications and many optimization problems that arise in practice can be reduced to convex optimization [8]. There are three main classes of structured convex optimization problems: linear programs (LP), semidefinite programs (SDP), and second-order conic programs (SOCP). The fastest (classical) algorithms for these problems belong to the family of interior-point methods (IPM). Interior point methods are iterative algorithms where the main computation in each step is the solution of a system of linear equations whose size depends on the dimension of the optimization problem. The size of structured optimization problems that can be solved in practice is therefore limited by the efficiency of linear system solvers - on a single computer, most open-source and commercial solvers can handle dense problems with up to tens of thousands of constraints and variables, or sparse problems with the same number of nonzero entries [30, 31]. In recent years, there has been a tremendous interest in quantum linear algebra algorithms following the breakthrough algorithm of Harrow, Hassidim and Lloyd [16].
- Asia > Middle East > Jordan (0.06)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
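To illustrate why linear system solvers bound the problem sizes interior-point methods can handle, below is a sketch of one damped Newton step for a log-barrier LP: the KKT solve is the dominant per-iteration cost. The tiny LP and the damping/safeguard choices are illustrative assumptions, not taken from any particular solver.

```python
import numpy as np

def barrier_newton_step(A, b, c, x, t):
    """One Newton step for the log-barrier LP  min c^T x - (1/t) * sum(log x)
    subject to A x = b.  Solving the KKT linear system below is the main
    computation of every interior-point iteration, as the abstract notes.
    """
    n, m = x.size, A.shape[0]
    H = np.diag(1.0 / (t * x ** 2))          # Hessian of the barrier term
    g = c - 1.0 / (t * x)                    # gradient of the barrier objective
    # KKT system:  [H  A^T; A  0] [dx; nu] = [-g; 0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    return np.linalg.solve(K, rhs)[:n]       # Newton direction dx

# Tiny illustrative LP: min x1 + 2 x2  s.t.  x1 + x2 = 1, x > 0.
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])                     # strictly feasible start
for t in [1.0, 10.0, 100.0]:                 # increasing barrier parameter
    for _ in range(20):
        x = x + 0.5 * barrier_newton_step(A, b, c, x, t)
        x = np.clip(x, 1e-9, None)           # crude safeguard to stay interior
print(x)                                     # approaches the vertex (1, 0)
```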
[Discussion] Areas of mathematics in physics in machine learning • r/MachineLearning
Have a look at fixed point functional analysis, which in physics is encountered in statistical physics and chaos theory. So far, fixed point analysis within machine learning is used for theoretical analysis of gradient methods. Non-linear analysis and fixed point methods might also lead to better descriptions of the spatio-temporal invariance that seems to be important for better network models.
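As a concrete instance of the fixed-point viewpoint mentioned above, the sketch below runs Banach-style fixed-point iteration and instantiates it with gradient descent on a quadratic, where the map T(x) = x - γ∇f(x) is a contraction for suitable γ; the quadratic and step size are illustrative assumptions.

```python
import numpy as np

def fixed_point_iterate(T, x0, tol=1e-10, max_iter=10_000):
    """Banach-style fixed-point iteration x_{k+1} = T(x_k).

    When T is a contraction, the iterates converge to T's unique fixed
    point; this is the lens through which gradient methods are analyzed.
    """
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Gradient descent on f(x) = 0.5 x^T Q x viewed as a fixed-point map:
# T(x) = x - gamma * Q x is a contraction when 0 < gamma < 2 / lambda_max(Q).
Q = np.array([[2.0, 0.0], [0.0, 0.5]])
gamma = 0.9 / np.linalg.eigvalsh(Q).max()
T = lambda x: x - gamma * (Q @ x)
print(fixed_point_iterate(T, np.array([1.0, -1.0])))  # tends to the fixed point 0
```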
Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory
Richtárik, Peter, Takáč, Martin
We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing for researchers from various communities to leverage their domain specific insights. In particular, our reformulation can be equivalently seen as a stochastic optimization problem, stochastic linear system, stochastic fixed point problem and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient conditions for the reformulation to be exact. Further, we propose and analyze three stochastic algorithms for solving the reformulated problem---basic, parallel and accelerated methods---with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix which depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning, and which refers to the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, stochastic Newton method, stochastic proximal point method, stochastic fixed point method, and stochastic projection method, with fixed stepsize (relaxation parameter), applied to the reformulations.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
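One concrete member of the family: sampling single rows of the system turns the basic method into randomized Kaczmarz, a stochastic projection method with a global linear rate. The sketch below implements that special case with the usual row-norm-squared sampling distribution; it does not cover the paper's general matrix-distribution setting, and the test system is synthetic.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, rng=None):
    """Randomized Kaczmarz for a consistent linear system A x = b.

    Each step projects the iterate onto the hyperplane of one randomly
    sampled equation, a special case of the paper's basic method.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    row_norms2 = (A ** 2).sum(axis=1)
    p = row_norms2 / row_norms2.sum()        # sample rows with prob ~ ||a_i||^2
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        x -= (a @ x - b[i]) / row_norms2[i] * a  # project onto {x : a^T x = b_i}
    return x

# Consistent system built from a known solution (illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))  # near 0
```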
Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)
Wilson, Andrew Gordon, Nickisch, Hannes
We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
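A minimal 1D sketch of the SKI structure K(X, X) ≈ W K_UU W^T, where the interpolation weights W map data points onto a regular inducing grid. Linear interpolation is used here for brevity where the paper uses local cubic interpolation, and the Toeplitz structure of K_UU that SKI exploits for fast matrix-vector products is noted but not exploited; the kernel, lengthscale, and grid size are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def ski_weights(x, grid):
    """Linear interpolation weights W so that K(x, x') ~ W K_uu W^T.

    Each row has two nonzeros (the bracketing grid points); with cubic
    interpolation, as in the paper, each row would have four.
    """
    W = np.zeros((x.size, grid.size))
    h = grid[1] - grid[0]
    idx = np.clip(((x - grid[0]) / h).astype(int), 0, grid.size - 2)
    frac = (x - grid[idx]) / h
    W[np.arange(x.size), idx] = 1 - frac
    W[np.arange(x.size), idx + 1] = frac
    return W

x = np.sort(np.random.default_rng(0).uniform(0, 1, 200))
grid = np.linspace(0, 1, 50)                # inducing/interpolation points
W = ski_weights(x, grid)
K_ski = W @ rbf(grid, grid) @ W.T           # structured kernel approximation
err = np.abs(K_ski - rbf(x, x)).max()
print(f"max abs error of SKI approximation: {err:.4f}")
```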